between the k-mer patterns from multiple sequences was
d in a different way. First, an F statistic was calculated using the
following equation,
\[
F = \sum_{\theta} \frac{\min\bigl(N_1(\theta), N_2(\theta)\bigr)}{\min(L_1, L_2) - k + 1} \tag{7.16}
\]
where the sum ran over the set of all k-mers, $\theta$ was one k-mer, $N_1(\theta)$ and $N_2(\theta)$
were the frequencies of $\theta$ in the two sequences, and $L_1$ and $L_2$
were the lengths of the two sequences. The distance between the two
sequences was then calculated as
\[
d = \frac{\log(0.1 + F) - \log(1.1)}{\log(0.1)} \tag{7.17}
\]
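As an illustration of Equations 7.16 and 7.17, the calculation for a single pair of sequences can be sketched in base R. This is a minimal sketch and not the kmer package's own implementation: the helper names `count_kmers` and `kmer_distance` are hypothetical, sequences are assumed to be plain character strings at least k characters long, and the natural logarithm is used (the base cancels in the ratio).

```r
# Count overlapping k-mers in a sequence given as a single string.
count_kmers <- function(s, k) {
  stopifnot(nchar(s) >= k)
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

# F statistic (Eq. 7.16) and distance d (Eq. 7.17) for two sequences.
kmer_distance <- function(s1, s2, k) {
  c1 <- count_kmers(s1, k)
  c2 <- count_kmers(s2, k)
  # k-mers missing from either sequence contribute min(...) = 0,
  # so only the shared k-mers need to be summed.
  shared <- intersect(names(c1), names(c2))
  overlap <- sum(pmin(as.integer(c1[shared]), as.integer(c2[shared])))
  f <- overlap / (min(nchar(s1), nchar(s2)) - k + 1)
  d <- (log(0.1 + f) - log(1.1)) / log(0.1)
  c(F = f, d = d)
}

# Toy sequences, invented for illustration only.
print(kmer_distance("ACGTACGT", "ACGTACGT", k = 3))
print(kmer_distance("AAAAAA", "CCCCCC", k = 3))
```

Two identical sequences share every k-mer, so F = 1 and d = 0. Sequences with no k-mer in common give F = 0, the largest distance the formula can return.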
The code shown below was used to apply the kmer package to analyse
the sequences:
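library(ape)
library(kmer)
library(insect)
library(Biostrings)
# Read the sequences (readFASTA() is provided by the insect package)
x <- readFASTA('SARS.HIV.fasta')
# Build a k-mer based tree with cluster() from kmer and draw it horizontally
plot(cluster(x), horiz=TRUE)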
Figure 7.12. The tree generated by the kmer package for 17 genome sequences.
Figure 7.12 shows the tree generated by kmer for these 17 genome
sequences. It shows the same pattern as seen in Figure 7.10 and Figure